Automatic Identification of AltLexes using Monolingual Parallel Corpora
نویسندگان
چکیده
The automatic identification of discourse relations is still a challenging task in natural language processing. Discourse connectives, such as since or but, are the most informative cues to identify explicit relations; however discourse parsers typically use a closed inventory of such connectives. As a result, discourse relations signaled by markers outside these inventories (i.e. AltLexes) are not detected as effectively. In this paper, we propose a novel method to leverage parallel corpora in text simplification and lexical resources to automatically identify alternative lexicalizations that signal discourse relation. When applied to the SimpleWikipedia and Newsela corpora along with WordNet and the PPDB, the method allowed the automatic discovery of 91 AltLexes.
منابع مشابه
A Preliminary Study of Finding Entailing Texts in a Domain-specific Monolingual Parallel Corpora
This paper introduces the possible usages, benefits, and challenges involved in the use of domain-specific monolingual parallel corpora in determining textual entailment (TE). A system that finds entailing text for a given statement is to be developed using monolingual parallel translations of the Bible as corpus as this is one of the most accessible monolingual parallel corpora. Different exis...
متن کاملParaphrasing with Bilingual Parallel Corpora
Previous work has used monolingual parallel corpora to extract and generate paraphrases. We show that this task can be done using bilingual parallel corpora, a much more commonly available resource. Using alignment techniques from phrasebased statistical machine translation, we show how paraphrases in one language can be identified using a phrase in another language as a pivot. We define a para...
متن کاملFinding Synonyms Using Automatic Word Alignment and Measures of Distributional Similarity
There have been many proposals to extract semantically related words using measures of distributional similarity, but these typically are not able to distinguish between synonyms and other types of semantically related words such as antonyms, (co)hyponyms and hypernyms. We present a method based on automatic word alignment of parallel corpora consisting of documents translated into multiple lan...
متن کاملCollocation Translation Acquisition Using Monolingual Corpora
Collocation translation is important for machine translation and many other NLP tasks. Unlike previous methods using bilingual parallel corpora, this paper presents a new method for acquiring collocation translations by making use of monolingual corpora and linguistic knowledge. First, dependency triples are extracted from Chinese and English corpora with dependency parsers. Then, a dependency ...
متن کاملA Particle Swarm Optimizer to Cluster Parallel Spanish-English Short-text Corpora Un Optimizador basado en Cúmulo de Part́ıculas para el Agrupamiento de Textos Cortos de Colecciones Paralelas en Español-Inglés
Short-texts clustering is currently an important research area because of its applicability to web information retrieval, text summarization and text mining. These texts are often available in different languages and parallel multilingual corpora. Some previous works have demonstrated the effectiveness of a discrete Particle Swarm Optimizer algorithm, named CLUDIPSO, for clustering monolingual ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017